A deviant case analysis pilot study analyzed California local education agency data to
determine the usefulness of regression analysis in predicting change in achievement from 1984 to
1989 and to identify outliers, or districts that show greater achievement changes than would be
expected given changed demographic conditions. This report on the Successful Indicators Study
discusses some previously identified statistical and methodological problems associated with the
use of regression, presents the findings of the pilot study, and recommends alternative methods for
selecting case study sites. Focus is on developing indicators of conditions and programs within a
metropolitan school district that predict success. Data from the 1980 Census, the 1989-90
California Basic Education Data System, and the California Assessment Program tests (reading and
mathematics scores for grades 3, 6, and 8 for the school years 1984-85 through 1989-90) are used.
The findings indicate that the regression procedure has not helped identify local education agencies
that are doing well and that have experienced large changes in the demographic conditions under
which they operate. Use of a combination of qualitative and quantitative methods is recommended
to identify successful local education agencies. Included are six tables and two graphs. (RLC)
INTRODUCTION

The original goal of the Successful Indicators Study (SIS) was to develop indicators of the 
conditions within a community and school district that foster a positive climate for improving the 
achievement of educationally disadvantaged children in the Western region's metropolitan areas. 
The dependent variables consist of various measures of change in student achievement from 1980 
to 1990. The independent variables consist of an array of demographic, fiscal resource, facility 
and community variables, school system organizational features, and indicators of local political 
culture. The combination of these variables represents the conditions under which school program 
responses have proven more or less effective and efficient over the decade. The intervening 
variables consist of programmatic or related efforts, including staff development, specialized staff 
recruitment and assignment practices, and service provisions that have been introduced by the 
communities and the school districts in efforts to improve the life chances, as well as the 
achievement levels, of educationally disadvantaged students. The SIS project also aimed to 
develop models and criteria of effective educational treatment of students in metropolitan local
education agencies (LEAs) and to assist interested LEAs and community agencies in adapting them
to local circumstances.

As originally designed, a regional study was planned to identify fifty districts from the
universe of LEAs lying within Metropolitan Statistical Areas (MSAs) in the four-state region of the
Pacific Southwest. In early 1991, SWRL staff completed a census of metro-area school districts in
California, Arizona, Nevada, and Utah. As it became evident that test score and other
demographic data would be available for all districts in these states, the original research design,
which used only a small sample of districts to identify outliers, was expanded to the universe of
relevant LEAs.

Although we had originally planned to identify districts in all four states using regression
analysis, California LEAs accounted for 80% of the LEAs in the region. Thus, we decided to try
out the procedure for identifying "interesting" LEAs on most of the universe instead of only a
sample, simply by looking at California LEAs for which computer data were available.

In our original design, we planned to identify districts that showed rapid growth in numbers
of educationally disadvantaged students from 1980 to 1990, using 1980 and 1990 Census district
data. However, the release of the school-district-level 1990 U.S. Census data was delayed, so we
looked at alternative sources of data. Consequently, in this analysis, we have used the 1980 Census
data, the 1989-1990 California Basic Education Data System (CBEDS) data, and California
Assessment Program (CAP) test data for the school years 1984-85 through 1989-90.

The major purpose of the deviant case analysis pilot study is to determine the usefulness of
regression analysis in predicting change in achievement from 1984 to 1989 and to identify outliers 
or districts that show greater achievement changes than would be expected given changed 
demographic conditions. Our original intention was to use these outliers as intensive case study 
sites. In this report, we discuss some previously identified statistical and methodological problems 
associated with the use of regression and present the findings of the pilot study. We also 
recommend alternative methods for selecting case study sites. 



STATISTICAL AND TESTING ISSUES 

Because the goal of the study is to develop indicators of conditions and programs within a 
school district that predict "success," the major dependent variable consists of a standardized 
measure of change in district-level achievement in math and reading between 1980 and 1990.
High growth over time was defined as the "success" measure for the regression. 

In the state of California, the CAP tests have been given since 1980, but prior to 1984, the 
CAP tests were sufficiently different so as to be non-comparable. Thus, we used achievement in 
1984-85 through 1989-1990. Although we had planned to look at achievement in grades 3, 6, 8, 
and 12, grade 12 scores had to be deleted as a new grade 12 CAP test was first administered in the 
1987-88 school year and older tests were non-comparable. No reading or math scores were 
available at the eighth grade level until 1984. Thus, we used reading and math scores for grades 3, 
6, and 8 from 1984-85 to 1989-90. 

Some limitations should be placed on interpretations made from the model presented. The 
data set contains aggregated district-level information and not individual level data. Both the 
demographic and the achievement data represent district averages calculated from either household 
(Census data) or school building (CAP and CBEDS) data. By grouping observations and 
estimating parameters based on grouped means, the variation between individual observations is 
lost. This may reduce the variation of the grouped data and may also artificially inflate the R²
result. Basically, what is lost is the information on the variation of observations within groups
(e.g., schools within districts). The R² of the regression equation may also be influenced by the
sample itself: the larger the variability of a given sample on the independent variables, the larger
the R². Low R² values reflect a large amount of variation in the dependent variable unexplained by
the model.
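A small simulation can make the aggregation point concrete. It is a hypothetical sketch, not study data: students are nested in districts, each student's outcome depends linearly on a predictor plus a large individual error, and the R² from regressing individual scores is compared with the R² from regressing district means. Averaging within districts cancels most of the individual error, so the district-level R² comes out much higher even though the underlying relationship is the same.

```python
import random

def r_squared(xs, ys):
    """R^2 of a simple least-squares regression of ys on xs (squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

random.seed(1)
n_districts, students_per_district = 50, 200

ind_x, ind_y = [], []      # individual-level observations
dist_x, dist_y = [], []    # district means of the same observations
for _ in range(n_districts):
    district_level = random.gauss(0, 1)           # hypothetical district-level predictor
    xs, ys = [], []
    for _ in range(students_per_district):
        x = district_level + random.gauss(0, 1)   # predictor also varies within the district
        y = 0.5 * x + random.gauss(0, 2)          # large individual-level error
        xs.append(x)
        ys.append(y)
    ind_x += xs
    ind_y += ys
    dist_x.append(sum(xs) / len(xs))
    dist_y.append(sum(ys) / len(ys))

r2_individual = r_squared(ind_x, ind_y)
r2_district = r_squared(dist_x, dist_y)
print(f"individual-level R^2: {r2_individual:.3f}")
print(f"district-mean R^2:    {r2_district:.3f}")
```

The slope is identical at both levels; only the unexplained within-district variation has been averaged away.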



Another assumption in regression analysis is that the model is perfectly specified and there
are no omitted yet significant predictors. It is likely that this assumption is incorrect. The
regression model used did not include school environment or program variables that reflect
within-district differences or that represent a particular district's policies and procedures. Variables such
as teacher or curriculum quality measures, which may also influence gain in achievement, are not 
included. The school effectiveness literature has shown evidence that school environment, special 
programs, teacher quality, and curriculum quality all have an influence on school-level 
achievement. However, no data on these variables were available. In addition, according to 
Pedhazur (1982), when relevant variables are omitted from the regression equation, and they are 
correlated with the variables remaining in the equation, estimation of the regression coefficients for 
the latter is biased. 
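Pedhazur's point can be illustrated with a hypothetical simulation. The outcome below depends on two correlated predictors: x1 stands in for a measured demographic change and x2 for an unmeasured school variable such as program quality (both roles and all coefficients are assumptions for illustration). When x2 is dropped, the x1 coefficient absorbs part of x2's effect; with these values the short-model coefficient is expected to be about 1.0 + 2.0 × 0.6 = 2.2 rather than the true 1.0.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

random.seed(2)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 has unit variance and correlates about 0.6 with x1, but is omitted from the short model.
x2 = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x1]
y = [1.0 * a + 2.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Full model: solve the 2x2 normal equations on centered data.
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 ** 2
b1_full = (s1y * s22 - s2y * s12) / det
b2_full = (s2y * s11 - s1y * s12) / det

# Short model: x2 left out, so b1 picks up 2.0 * cov(x1, x2) / var(x1).
b1_short = s1y / s11

print(f"b1, x2 included: {b1_full:.2f}   b2: {b2_full:.2f}")
print(f"b1, x2 omitted:  {b1_short:.2f}")
```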

One of the primary assumptions of regression analysis is that the independent variables are 
measured without error. In this study, because the 1980 Census demographic variables of interest 
(population size: 5-17 year olds, percent non-white, percent income below poverty level, and 
percent of 5-17 year olds with poor or no English ability) were not available for 1990, proxies of 
these were used. The measures were not strictly comparable. For the variable "income below
poverty level," in 1980, the Census variable consisted of percent of 5-17 year olds in households
with income below poverty level (mean was 12.9%). For 1990, the CAP proxy of the 1980
variable consisted of percent of students in a district who received AFDC (mean was 13%). In
another example, for 1980, the Census variable of interest was percent of 5-17 year olds with poor
or no English speaking ability (mean was 3.8%), while in 1990 the CAP proxy used was percent
of students in grades 3, 6, and 8 who were considered LEP (mean was 8.8%). We do not know
the relationship between the 1980 variables of interest and the 1990 proxies. Use of such proxies
in calculation of change measures (e.g., change = variable at time 2 - variable at time 1) leads to
measurement error and results in low reliability of measures. Errors of measurement in the
independent variables in a regression analysis may lead to either an upward or a downward bias in
the estimation of the regression coefficients.
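The effect of measurement error can be sketched in the simplest case, a single predictor with random error, where the slope is biased toward zero by the reliability ratio var(x) / (var(x) + var(error)). With several correlated predictors the bias can go in either direction, as noted above; this hypothetical simulation covers only the single-predictor attenuation case.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(3)
n = 5000
true_change = [random.gauss(0, 1) for _ in range(n)]       # error-free predictor (variance 1)
y = [0.5 * t + random.gauss(0, 1) for t in true_change]    # true slope is 0.5
# Observed predictor = true value + error of equal variance, so reliability = 0.5.
observed = [t + random.gauss(0, 1) for t in true_change]

reliability = 1.0 / (1.0 + 1.0)
b_true = slope(true_change, y)
b_observed = slope(observed, y)
print(f"slope, true predictor:   {b_true:.2f}")
print(f"slope, noisy predictor:  {b_observed:.2f}")
print(f"expected attenuation:    {0.5 * reliability:.2f}")
```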



DEVIANT CASE STUDY ANALYSIS 

The purpose of the deviant case analysis pilot was to identify districts that show greater 
achievement changes than expected in order to identify metropolitan LEAs where more intensive 
case studies could be done. We are interested in LEAs that have experienced large demographic
changes relative to other LEAs in conditions that provide greater challenge for educators (e.g., an
increase in numbers of students in poverty). Additionally, we want to look at those LEAs whose
change in student achievement has been substantially better than historically produced under such
demographic conditions.

Demographic data at the LEA level are available on Census Bureau Summary Tape Files 
(STF1F and STF3F). Examining the 1980 census data revealed that many of the LEAs located in 
MSAs had a substantial fraction of their population residing in areas classified as rural. Figure 1 is
a histogram showing the number of LEAs by categories of percent rural population. We decided to 
exclude the 37% of the LEAs that had 50% or more of their population living in rural areas because 
the METRO Center mission is to study metropolitan problems. 

Figure 1

Number of LEAs by percentage of the population living in rural areas.

[Histogram: number of LEAs (vertical axis) by percent rural category, i.e., percentage of the LEA population living in rural areas (horizontal axis).]


Examination of the CAP achievement data revealed that, in LEAs where few students were 
assessed, scores varied markedly from year to year. Staff at the California Assessment Program 
confirmed that they were aware of this problem. Since CAP scores also bounce around from year
to year because of student population changes, the state does not report data for very small LEAs
and recommends care in interpreting data from small LEAs for which data were reported. To deal
with the problem of instability of test scores and the undue influence of district size on achievement
scores and on the creation of outliers, we elected to exclude LEAs when assessment data were 
available for fewer than 100 students for grades 3, 6, and 8. This is about 20% of the California 
districts. Table 1 shows that sixty-two percent of the LEAs are retained for analysis when the 
small or rural LEAs are excluded. 
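The two exclusion rules (50% or more of the population rural, or fewer than 100 students assessed in grades 3, 6, and 8) amount to a simple filter. This sketch assumes a hypothetical record layout with `pct_rural` and `n_assessed` fields; the district names are placeholders, not LEAs from the study.

```python
from dataclasses import dataclass

@dataclass
class LEA:
    name: str
    pct_rural: float   # percent of population living in rural areas (1980 Census)
    n_assessed: int    # students assessed in grades 3, 6, and 8

def retained(lea: LEA) -> bool:
    """Keep an LEA only if it is less than 50% rural and has at least 100 tested students."""
    return lea.pct_rural < 50.0 and lea.n_assessed >= 100

leas = [
    LEA("District A", 12.0, 4200),
    LEA("District B", 65.0, 900),   # excluded: 50% or more rural
    LEA("District C", 20.0, 85),    # excluded: fewer than 100 students assessed
    LEA("District D", 49.9, 100),
]
kept = [lea.name for lea in leas if retained(lea)]
print(kept)  # ['District A', 'District D']
```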

Table 1

Number and Percent of LEAs by Rurality and Number of Students Assessed

                          Number of students assessed in LEA
Percent Rural             Less than 100    100 or more    Total

Number
  50% or more                  145             100          245
  Less than 50%                  7             397          404
  Total                        152             497          649

Percent
  50% or more                   22              15           37
  Less than 50%                  1              62           63
  Total                         23              77          100


We had planned to use U.S. Census data to measure the changing conditions under which 
the LEAs were operating. However, data from the 1990 U.S. Census are not yet available at the 
LEA level. Consequently, for this pilot study we have had to use other data sources to locate data 
somewhat comparable to 1980 Census data. CBEDS provides information on LEA enrollment by
ethnicity and grade in the School Information Form (SIF) data base. California Assessment
Program (CAP) data include percentages of Limited English Proficient (LEP) students and
percentages of students receiving Aid to Families with Dependent Children (AFDC). Table 2
shows the predictor variables used in the regression, along with means. Change in population size 
(POP) is the initial measure divided by the final measure. For the three other variables, change is 
the final measure minus the initial measure. 
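Following the definitions just given, the four change measures for one LEA could be computed as below. The dictionary keys and example values are hypothetical; POP follows the text's definition (initial measure divided by final measure), and the other three are percentage-point differences.

```python
def change_measures(initial: dict, final: dict) -> dict:
    """Change scores for one LEA, following the definitions in the text.

    initial/final map "pop", "min", "pov", "lep" to the initial (1980 Census)
    and final (CBEDS/CAP) values.
    """
    return {
        "POP": initial["pop"] / final["pop"],   # ratio: initial divided by final, as defined
        "MIN": final["min"] - initial["min"],   # percentage-point differences
        "POV": final["pov"] - initial["pov"],
        "LEP": final["lep"] - initial["lep"],
    }

# Hypothetical LEA; the values are illustrative, not from the study.
initial = {"pop": 5000, "min": 20.0, "pov": 12.0, "lep": 4.0}
final = {"pop": 6250, "min": 35.5, "pov": 14.5, "lep": 9.0}
changes = change_measures(initial, final)
print(changes)  # {'POP': 0.8, 'MIN': 15.5, 'POV': 2.5, 'LEP': 5.0}
```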



Table 2

Predictor Variables and Their Components Used in the Regression Analysis

Variable                        Initial Measure                       Final Measure

Percentage change in            1980 Census: number of 5-17 year      1989-90 SIF: total LEA
population size (POP)           olds factored by the proportion of    enrollment (n = 6060)
                                the school-age population
                                represented by the grades served
                                by the LEA (n = 5944)

Change in % minority (MIN)      1980 Census: % non-white 5-17 year    1989-90 SIF: % non-white
                                olds (mean = 21.6%)                   enrollment (mean = 39.9%)

Change in level of poverty      1980 Census: % of 5-17 year olds in   1987-88 CAP: % of students
(POV)                           households with income below          receiving AFDC (mean = 13.2%)
                                poverty level (mean = 12.9%)

Change in % with limited        1980 Census: % of 5-17 year olds      1989-90 CAP: % of students
English speaking ability (LEP)  with poor or no English speaking      who are LEP (mean = 8.8%)
                                ability (mean = 3.8%)



The initial measure of population size is complicated by the fact that an LEA may not serve all
grades, and therefore its total enrollment is not an accurate representation of the population of 5-17
year olds. The U.S. Census data were adjusted to represent the same population as the LEA
enrollment. The proportion of students in each grade across all the LEAs in the regression analysis
was determined. The number of 5-17 year olds from the census was multiplied by the sum of the 
proportions of grades served by the LEA. If an LEA served all grades, the sum of the proportion 
would be one and there would be no adjustment to the census data. 
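The adjustment can be sketched as follows. The grade proportions here are invented for illustration (the study derived them from the LEAs in the regression), and grades are coded 0 for kindergarten through 12.

```python
def adjusted_census_count(n_5_17: float, grade_share: dict, grades_served: list) -> float:
    """Scale the census count of 5-17 year olds to the grades an LEA serves.

    grade_share maps each grade to its proportion of total enrollment across
    all LEAs in the analysis; the proportions should sum to 1.
    """
    return n_5_17 * sum(grade_share[g] for g in grades_served)

# Hypothetical shares: 13 grades (K-12) with equal enrollment in each.
shares = {g: 1 / 13 for g in range(13)}

elementary = list(range(0, 9))   # a K-8 elementary district
unified = list(range(0, 13))     # a K-12 unified district
print(adjusted_census_count(13000, shares, elementary))  # about 9000
print(adjusted_census_count(13000, shares, unified))     # about 13000: no adjustment
```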

CAP provided LEA level reading and math achievement data for grades 3, 6, and 8 for 
school years 1984-85 through 1989-90. Both scaled scores and statewide percentile rank were 
reported. For a given year, the LEA's achievement level was the weighted mean of the scores 
across grades 3, 6, and 8 and across reading and math. The weighted mean was computed for 
both the scaled score and the rank. Change in achievement was computed two ways. The first 
way was simply taking the difference between the 1989-90 mean and the 1984-85 mean. The 
second way was to compute the slope of the time series regression of the means across all the years 
from 1984-85 through 1989-90. Thus, four measures of change in achievement were computed. 
The slope of the time series regression of mean rank (RNK) gave the largest multiple correlation
and is reported in Table 3, along with variables included in the regression, multiple R, and R².
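The two ways of measuring change can be sketched for a single LEA. The report does not say what the weights were; this sketch assumes the number of students tested in each grade-by-subject cell, and all scores are hypothetical. Applying the same two functions to scaled scores and to percentile ranks gives the four change measures.

```python
def weighted_mean(scores, weights):
    """Mean of the grade-by-subject scores, weighted (by assumption) by students tested."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (the time series regression slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical students tested: grade 3 reading, grade 3 math, grade 6 reading,
# grade 6 math, grade 8 reading, grade 8 math.
weights = [200, 200, 150, 150, 100, 100]

# Hypothetical percentile ranks in the same cell order; key 1984 = school year 1984-85.
scores = {
    1984: [49, 47, 50, 46, 48, 48],
    1985: [50, 48, 51, 47, 49, 49],
    1986: [50, 48, 51, 47, 49, 49],
    1987: [52, 50, 53, 49, 51, 51],
    1988: [53, 51, 54, 50, 52, 52],
    1989: [54, 52, 55, 51, 53, 53],
}

years = sorted(scores)
means = [weighted_mean(scores[y], weights) for y in years]

change_difference = means[-1] - means[0]   # 1989-90 mean minus 1984-85 mean
change_slope = ols_slope(years, means)     # average change per year, using all six years

print(means)                                      # [48.0, 49.0, 49.0, 51.0, 52.0, 53.0]
print(change_difference, round(change_slope, 3))  # 5.0 1.029
```

The slope uses every year, so a single unusual year moves it less than it moves the simple difference.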

Table 3

Regression Analysis of Change in Achievement with Change in Selected Demographic
Characteristics of Local Educational Agencies

Dependent variable
  RNK     state percentile rank change 1984-89 using slope of time series regression

Independent variables
  POP     percent change in 5-17 year old population 1980 to 1989-90
  MIN*    change in 5-17 year olds minority population percentage 1980 to 1989-90
  POV     change in 5-17 year olds poverty population percentage 1980 to 1989-90
  LEP     change in 5-17 year olds limited English proficient population percentage
          1980 to 1989-90

  * significant at p < .05

R = .16160   R² = .0261   # = 4.383   F = 2.56754   p = .0378
Regression analysis provides a way to predict the value of one variable from other variables
theorized to be important predictors. We can identify those LEAs whose actual change in 
achievement was substantially better than would be predicted by examining the residuals produced 
by regression analysis. Those LEAs with large positive residuals would be considered for case 
studies. 
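The residual screen can be sketched without a statistics package. This sketch fits an ordinary least-squares regression of RNK on the four change variables by solving the normal equations, then ranks LEAs by residual (actual minus predicted change). The ten LEAs and their values are invented for illustration; a real analysis would normally rely on a statistics library rather than hand-rolled linear algebra.

```python
def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]            # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residuals(X, y, b):
    """Actual minus predicted value for each case."""
    return [yi - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)))
            for row, yi in zip(X, y)]

# Hypothetical LEAs: (POP, MIN, POV, LEP) change scores and RNK; not study data.
names = [f"LEA {i}" for i in range(1, 11)]
X = [
    (1.10, 15.0, -1.0, 3.0), (0.95, 22.0, 2.5, 7.5), (1.30, 8.0, -3.0, 1.0),
    (1.05, 30.5, 4.0, 12.0), (0.80, 12.0, 0.5, -2.0), (1.20, 18.5, -2.0, 6.0),
    (1.00, 25.0, 1.5, 9.5), (1.45, 5.0, 3.0, 0.5), (0.90, 35.0, -0.5, 15.0),
    (1.15, 10.0, 6.0, 4.0),
]
rnk = [0.4, 1.1, -0.3, 2.6, -0.8, 0.2, 0.9, -0.5, 1.8, 0.0]

b = ols_fit(X, rnk)
res = residuals(X, rnk, b)
# Case study candidates: largest positive residuals (better change than predicted).
ranked = sorted(zip(names, res), key=lambda t: t[1], reverse=True)
for name, r in ranked[:3]:
    print(f"{name}: residual {r:+.2f}")
```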

Although the regression was significant, the R² was very low (.026). Thus the independent
variables account for very little of the change in achievement. Only one variable, MIN, or change 
in minority population percentage, was significant. 
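The gap between "significant" and "useful" follows from the arithmetic of the overall F test. The reported R = .16160 squares to R² of about .0261, and F = (R²/k) / ((1 - R²)/(n - k - 1)) with k = 4 predictors grows with the number of LEAs n. The report does not state n; the sample sizes below are hypothetical, although n = 388 happens to reproduce the reported F of about 2.57.

```python
def f_statistic(r2: float, k: int, n: int) -> float:
    """Overall F for a regression with k predictors fitted to n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

R = 0.16160
r2 = R ** 2
print(round(r2, 4))  # 0.0261

# n is NOT reported in the study; these values are for illustration only.
for n in (100, 388, 1000):
    print(n, round(f_statistic(r2, k=4, n=n), 3))
```

The same tiny R² is far from significant with 100 cases and clearly significant with 1,000, which is why significance alone says little about practical usefulness here.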

The ten LEAs with the largest residuals (RESID) are reported in Table 4, along with the values
of each of the variables in the regression. Table 5 shows the ten LEAs with the largest positive
change in achievement (RNK). The same ten LEAs are in both tables with only a slight difference 
in order. The regression procedure does not improve our ability to locate LEAs with relatively 
high gain in achievement in the context of change in demographic conditions. We can and do 
identify the same LEAs simply by looking at achievement. 
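The overlap can be checked directly from the published values. This sketch re-sorts the ten LEAs of Table 4 by RNK, confirms that this reproduces the order of Table 5, and computes a Spearman rank correlation between the residual ordering and the achievement ordering. The correlation covers only these ten LEAs.

```python
# (LEA, RESID, RNK) as reported in Table 4.
table4 = [
    ("NEWMAN-CROWS LANDING UNIFIED", 4.28, 4.15),
    ("EDISON ELEMENTARY", 3.76, 3.87),
    ("SOUTH BAY UNION ELEMENTARY", 3.29, 3.23),
    ("SANTA BARBARA ELEMENTARY", 2.81, 2.71),
    ("BEAUMONT UNIFIED", 2.63, 2.59),
    ("RIVERBANK ELEMENTARY", 2.61, 2.62),
    ("RAMONA CITY UNIFIED", 2.56, 2.73),
    ("VENTURA UNIFIED", 2.51, 2.66),
    ("RIPON UNIFIED", 2.25, 2.34),
    ("MANTECA UNIFIED", 2.17, 2.35),
]
# Order reported in Table 5.
table5 = [
    "NEWMAN-CROWS LANDING UNIFIED", "EDISON ELEMENTARY", "SOUTH BAY UNION ELEMENTARY",
    "RAMONA CITY UNIFIED", "SANTA BARBARA ELEMENTARY", "VENTURA UNIFIED",
    "RIVERBANK ELEMENTARY", "BEAUMONT UNIFIED", "MANTECA UNIFIED", "RIPON UNIFIED",
]

by_resid = [name for name, _, _ in sorted(table4, key=lambda t: t[1], reverse=True)]
by_rnk = [name for name, _, _ in sorted(table4, key=lambda t: t[2], reverse=True)]

def spearman(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    n = len(order_a)
    pos_b = {name: i for i, name in enumerate(order_b)}
    d2 = sum((i - pos_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(by_rnk == table5)                      # True
print(round(spearman(by_resid, by_rnk), 3))  # 0.842
```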



Table 4

LEAs With Much Higher than Predicted Change in Achievement

LEA NAME                       RESID    POP     MIN     LEP     POV    RNK
NEWMAN-CROWS LANDING UNIFIED    4.28   1.32   32.49   -3.11    -.69   4.15
EDISON ELEMENTARY               3.76   1.35   16.37    2.05   -1.76   3.87
SOUTH BAY UNION ELEMENTARY      3.29   1.23   31.51   17.53    -.09   3.23
SANTA BARBARA ELEMENTARY        2.81    .73   37.99   20.84   -5.12   2.71
BEAUMONT UNIFIED                2.63   1.10   17.45     .90    9.54   2.59
RIVERBANK ELEMENTARY            2.61   1.18   26.16   19.04    -.35   2.62
RAMONA CITY UNIFIED             2.56   1.61   10.93    1.69    -.93   2.73
VENTURA UNIFIED                 2.51    .87   11.40    5.74   -1.01   2.66
RIPON UNIFIED                   2.25   1.29   12.02   -2.51    3.11   2.34
MANTECA UNIFIED                 2.17   1.26    8.28    -.27    -.64   2.35
Table 5

LEAs With the Highest Change in Achievement

LEA NAME                        RNK
NEWMAN-CROWS LANDING UNIFIED    4.15
EDISON ELEMENTARY               3.87
SOUTH BAY UNION ELEMENTARY      3.23
RAMONA CITY UNIFIED             2.73
SANTA BARBARA ELEMENTARY        2.71
VENTURA UNIFIED                 2.66
RIVERBANK ELEMENTARY            2.62
BEAUMONT UNIFIED                2.59
MANTECA UNIFIED                 2.35
RIPON UNIFIED                   2.34



Table 6 presents the same set of LEAs but contains z-scores of the regression variables. This
gives a clearer picture of how demographic conditions within each LEA varied from their respective
means. In eighty percent of the cases, z-score values of the independent variables are less than one
standard deviation from the mean. Additionally, there are both positive and negative z-scores.
Clearly the cases identified by the regression analysis are not at the extremes of the changes in
demographic conditions experienced by the LEAs.
Table 6

LEAs With the Highest Change in Achievement

LEA NAME                        ZPOP   ZMIN   ZLEP   ZPOV   ZRNK   ZPRED
NEWMAN-CROWS LANDING UNIFIED     .28   1.00     --   -.17   4.05     .97
EDISON ELEMENTARY                .33   -.26     --   -.30   3.78    -.36
SOUTH BAY UNION ELEMENTARY       .14    .92   1.28   -.10   3.15     .63
SANTA BARBARA ELEMENTARY        -.62   1.42   1.69   -.71   2.64     .84
BEAUMONT UNIFIED                -.05   -.17   -.73   1.07   2.52     .48
RIVERBANK ELEMENTARY             .07    .50   1.47   -.13   2.55     .21
RAMONA CITY UNIFIED              .73   -.68   -.64   -.20   2.66    -.75
VENTURA UNIFIED                 -.40   -.64   -.15   -.21   2.58    -.64
RIPON UNIFIED                    .23   -.60  -1.15    .29   2.27    -.29
MANTECA UNIFIED                  .20   -.89   -.88   -.16   2.28    -.83



The last variable in Table 6 (ZPRED) is the z-score of the predicted value of the
regression. All of the predicted values are less than one standard deviation from the mean. This
indicates that the LEAs with the highest gain in achievement are in the middle of the distribution in
terms of the changes in demographic conditions under which they operate. The scatter plot in
Figure 2 further illustrates this point, showing with open circles the ten high LEAs previously
identified. As can be seen, these LEAs are located in the middle of the distribution of composite
change in demographic conditions (along the horizontal axis). The regression procedure has not
helped us identify LEAs that are doing well and that have experienced large change in the
demographic conditions under which they operate. The identified LEAs are in the middle of the
change-in-demographics continuum.
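Both "within one standard deviation" statements can be checked against the legible entries of Table 6, where each z-score is (value - mean) / standard deviation across the LEAs in the regression. This sketch copies the four independent-variable columns and ZPRED; the two ZLEP entries that are illegible in the source are recorded as None and skipped. Of the 38 legible values, 31 (about 82%) fall within one standard deviation, in line with the report's "eighty percent" figure, and every ZPRED value does.

```python
# z-scores for the ten LEAs, in Table 6 row order (None = illegible in the source).
independent = {
    "ZPOP": [.28, .33, .14, -.62, -.05, .07, .73, -.40, .23, .20],
    "ZMIN": [1.00, -.26, .92, 1.42, -.17, .50, -.68, -.64, -.60, -.89],
    "ZLEP": [None, None, 1.28, 1.69, -.73, 1.47, -.64, -.15, -1.15, -.88],
    "ZPOV": [-.17, -.30, -.10, -.71, 1.07, -.13, -.20, -.21, .29, -.16],
}
zpred = [.97, -.36, .63, .84, .48, .21, -.75, -.64, -.29, -.83]

legible = [z for col in independent.values() for z in col if z is not None]
within = [z for z in legible if abs(z) < 1]
share = len(within) / len(legible)
print(len(within), len(legible), round(share, 2))  # 31 38 0.82
print(all(abs(z) < 1 for z in zpred))               # True
```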
Figure 2

Change in achievement as a function of change in population, percentage non-white,
percentage poverty, and percentage limited English speaking.
CONCLUSIONS AND RECOMMENDATIONS

The pilot study has analyzed California LEA data to identify outliers or districts that show 
greater achievement change than would be expected, based on their demographic changes and 
conditions. The pilot also aimed to ascertain the viability and validity of using regression to 
identify outliers. 

The multiple regression, although significant, had an extremely low R² value, and thus has
little practical significance. The procedure did identify some outliers based upon regression
residuals, but the same outliers would have been identified simply by examining achievement
alone. Moreover, LEAs with the highest gains in achievement were found to be located in the
middle of the distribution in terms of demographic conditions such as change in minority
population or change in poverty. The regression procedure does not help us locate LEAs that
exhibit both relatively high gains in achievement and large change in demographic conditions.

Reliance on regression procedures to identify outliers to enable selection of case study sites 
seems inadvisable. A combination of qualitative and quantitative methods may be more productive. 
To deal with the problems identified in this pilot study, several options are possible. First, asking
district and state level officials for suggestions on possible case study LEAs is one option. 
Second, selection of LEAs based on level of changes in population and poverty from 1980-90 is 
another possibility. To identify LEAs with large demographic change that also show change in 
student achievement higher than expected, the following procedures are suggested: 

1. Create a composite change variable (1980-90) using POP, MIN, LEP, and POV, and
calculate it for urban LEAs in the sample (we could use the 1980 Census and 1990 CAP and
CBEDS data, or use 1984-85 and 1989-90 CAP data). Alternatively, we could use MIN alone, as
this was the only significant predictor in the regression.

2. Sort LEAs by level of demographic change.

3. Identify LEAs scoring in the top 10% (or other fraction) of demographic change.

4. Within that pool, order LEAs by number of CAP scale score points increased from 1985 to
1990. (NOTE: we may want to use grade 8 only. A State of California study on eighth
grade achievement found that CAP scale scores increased on average 17 points for grade
8 from 1985-86 to 1989-90. For the CAP, a gain of 18 scale points equals one-half grade
level.)

5. Identify districts (in the pool of LEAs that are above average on demographic change)
that score above average on achievement gain, and use these to choose possible case study sites.
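Steps 1 through 5 could be sketched as follows. Several details are assumptions the text leaves open: the composite is taken as the sum of the four z-scored change variables (each assumed to be coded so that larger values mean more demographic change), "above average" in step 5 is taken as above the mean gain within the pool, and the district records are invented. The conversion helper uses the text's figure of 18 CAP scale points per half grade level.

```python
from statistics import mean, pstdev

def zscores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def select_sites(districts, top_fraction=0.10):
    """districts: dicts with POP, MIN, LEP, POV change scores and 'gain' (CAP scale points)."""
    keys = ["POP", "MIN", "LEP", "POV"]
    z = {k: zscores([d[k] for d in districts]) for k in keys}
    # Step 1: composite change = sum of z-scores (an assumed form).
    composite = [sum(z[k][i] for k in keys) for i in range(len(districts))]
    # Steps 2-3: sort by composite change and keep the top fraction.
    order = sorted(range(len(districts)), key=lambda i: composite[i], reverse=True)
    pool = [districts[i] for i in order[:max(1, round(top_fraction * len(districts)))]]
    # Step 4: order the pool by scale-score gain.
    pool.sort(key=lambda d: d["gain"], reverse=True)
    # Step 5: keep districts with above-average gain within the pool.
    avg_gain = mean(d["gain"] for d in pool)
    return [d["name"] for d in pool if d["gain"] > avg_gain]

def grade_levels(scale_points):
    """CAP scale points to grade levels, using 18 points = one-half grade level."""
    return scale_points / 36.0

# Hypothetical districts: demographic change rises with i; gains are arbitrary.
districts = [
    {"name": f"D{i}", "POP": 1 + 0.05 * i, "MIN": 2.0 * i, "LEP": 1.0 * i,
     "POV": 0.5 * i, "gain": g}
    for i, g in enumerate([50, 12, 20, 8, 15, 22, 18, 35, 30, 10])
]
print(select_sites(districts, top_fraction=0.3))  # ['D7', 'D8']
print(round(grade_levels(17), 2))                 # 0.47
```

D0 has the largest gain but the least demographic change, so it never enters the pool; this is the intended difference from ranking on achievement alone.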

To identify high-growth high school LEAs, it may be useful to use CAP data. Once districts
with high composite growth have been identified (as per the previously described procedure), the
percentage meeting various quality indicator performance levels identified by the State of California
can be calculated. Indicators for grade 12 that are used by the state in its California Performance
Reports include percent reaching the commendable level and above on CAP reading, math, and
writing tests; geometry completion; four or more years of English; and dropout rate. An average
performance value, which is a weighted average of all of the quality indicator values, can be
interpreted as the percentage of students who, on average across the quality indicators, perform
at or above the established standards. Use of these indicators would enable us
to explore "high achievement" in a broader way than if we only use CAP scores.
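An average performance value of this kind is a weighted average of indicator percentages. The state's actual weights are not given here, so this sketch assumes equal weights by default, and the indicator values are hypothetical. An indicator where lower is better, such as the dropout rate, is assumed to have been restated as a percentage meeting the standard (here, "not dropping out"); how the state handled this is not described in the text.

```python
from typing import Dict, Optional

def average_performance_value(indicators: Dict[str, float],
                              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of quality-indicator percentages (each 0-100, higher = better)."""
    if weights is None:
        weights = {name: 1.0 for name in indicators}   # assumption: equal weights
    total_weight = sum(weights[name] for name in indicators)
    return sum(indicators[name] * weights[name] for name in indicators) / total_weight

# Hypothetical grade 12 indicator values for one district (percent meeting each standard).
district = {
    "CAP reading commendable or above": 30.0,
    "CAP math commendable or above": 26.0,
    "CAP writing commendable or above": 34.0,
    "geometry completion": 52.0,
    "four or more years of English": 70.0,
    "not dropping out": 88.0,
}
print(round(average_performance_value(district), 1))  # 50.0
```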
Another option that could be used to identify districts for case studies is to create lists of
districts ordered by their rank on demographic variables (e.g., POP, MIN, POV, and LEP), along
with state percentile rank change from 1984-89 (using the slope of the time series regression). The
lists, which could be sorted by individual variables, would also include district values on the other
variables of interest. The top 10 districts could be identified, or the lists could be examined to find
"interesting" LEAs.

To deal with other states in the region, other strategies will be necessary. In the case of
Nevada, only two LEAs are of interest: Washoe County and Clark County. Examination of state
reports shows that test scores have been declining in Clark County, while the minority
population there has increased. Test scores are improving slightly in Washoe County, and
there has also been an increase in the minority population there.

In Arizona, there are approximately 50 LEAs in the Phoenix and Tucson MSAs. Although
the ITBS test has been given there for ten years or more, the test was re-normed after 1986, so
scores from 1984-86 are not comparable with those from 1987 and later. It is therefore
recommended that achievement from 1987-1990 be calculated for selected districts. Relevant
demographic data are available by district (e.g., LEP, free lunch, mobility, percent minority); thus a
composite change could also be calculated as for California.

In Utah, the state administered the Stanford Achievement Test from 1985-1989 in grades 3, 5,
7, and 11. We have a recent report that contains demographic data for 1990 by district (e.g.,
percent tested; percent of students receiving free lunch, AFDC, and foster care; and median 1990
test scores), and data for earlier years are available from the individual districts.

In conclusion, the findings of the pilot study indicate that the regression procedure has not 
helped us to identify LEAs that are doing well and that have experienced large change in the
demographic conditions under which they operate. We recommend using a combination of 
qualitative and quantitative methods to identify LEAs and have presented several options. 